STUDY OF INTERMITTENT FIELD 
HARDWARE FAILURE DATA IN DIGITAL ELECTRONICS 
Edward J. O'Neill and James R. Halverson 
Sperry Univac 


1.0 Summary 

Under this contract (NASA Contract NAS 1-15574) Sperry Univac 
was asked to investigate their data recording and retrieval 
system for failures of an intermittent nature that occurred in 
field operation. Due to the nature of an intermittent problem 
and the reporting of the problem being at the discretion of the 
user, data referring to the first manifestation of an intermittent 
failure is not available. However, Sperry Univac developed a list 
of failure mechanisms that could manifest themselves as 
intermittents. This list was used to retrieve, from the data system, 
those failures and their times that could be the final 
manifestation of a previously intermittent problem. 

Three time periods were studied and probability functions were 
fitted and tested for goodness of fit to the data of intermittent 
and potentially intermittent failures. This was done for the 
computer and for the SSI digital microcircuit components. 

Results show that the exponential model of time to intermittent 
failure is adequate for the microcircuits. However, the Weibull 
distribution gives a slightly more accurate fit in some time 
periods. The results from the different time periods indicate 
that the failure rate for intermittents increases as the age of 
the microcircuits increases. However, it is felt that 
further investigation of larger time periods is necessary to 
confirm the results indicated in this study. 

2.0 Introduction 


2.1 Introduction 

Intermittent hardware failures are known to have an important 
impact on the reliability of digital systems. However, accurate 
intermittent failure models of the type required to make realis- 
tic reliability assessments are not readily available. This 
study makes available a data base of intermittent failure in- 
formation, based on field failure data, which were classified 
by failure mechanisms and their likelihood of having been 
intermittent (quasi-intermittent). 

This study will direct its attention toward actual failures that 
occurred in field-installed hardware and were introduced into our 
failure analysis cycle. This approach, while limited in the 
total population of failures, provides a new data base of quasi- 
intermittent failure data for possible application to future 
reliability assessments. 

2.2 Study Objective 

The objective of this study is to develop a data base of informa- 
tion, based on available field failure data, for intermittent 
digital hardware failures. 

2.3 Study Plan 

To meet this objective this study will i) define the problem of 
intermittent failure, ii) describe Sperry Univac's data recording 
and retrieval system, iii) study the problem at the computer 
level, and iv) study the problem at the microcircuit device level. 

3.0 Study Definition 


3.1 Intermittent Definition 

An intermittent is defined as a detected malfunction of a logic 
net which was operating properly prior to the malfunction and 
resumes normal operation in less time than the time needed to isolate 
the malfunctioning net to the lowest replaceable unit (LRU). 

In presently deployed computers, the time to isolate is of 
critical importance; that is, the time for the maintenance 
technician or the Built-in-Test (BIT) logic to find the problem and replace 
the component. 

The impact of the duration of intermittency of any given mal- 
function and its frequency are dependent on the system archi- 
tecture, software, and maintenance tools. 

In older systems, and to some extent the systems of today, the 
intermittent was always detected by the operating software. 

The maintenance technician was then called and by utilizing 
his tools, i.e., test programs, scope, VOM, etc., he was ex- 
pected to recreate the detection scenario and isolate the problem 
to some LRU. In this case, any intermittent with a duration of 
less than, say, 30 minutes, would not be isolated and would be 
declared an intermittent thus remaining in the system to cause 
trouble when it again fails. 

In some present day equipment and potentially most new equip- 
ment, the task of both intermittent malfunction detection and 
isolation will fall upon BIT. If BIT were designed to 
constantly monitor all logic nets, the detection and 
isolation of malfunction would occur almost instantaneously. 
This would mean that only malfunctions having a duration of a 
few nanoseconds would be classified as intermittent. 


The definition of the duration of an intermittent has been 
specifically bounded by malfunction isolation time. This is due to the 
assumption that once the malfunction has been isolated and, con- 
sequently, removed from the system, the fact that the replaced 
item may once again resume normal operation is of no consequence 
to the system operation. This does, however, pose a significant 
problem for the failure analysis task. 

3.2 Constraints on the Study 

Historically, Sperry Univac has not maintained a data base of 
intermittent malfunctions. This is due to the following 
reasons: 

Most of Sperry Univac's exposure to the system is that 
of equipment checkout. Once the equipment is running 
properly, it is delivered to the customer. The check- 
out time is but a small fraction of the total system 
life cycle and, as such, the quantity of intermittent 
failures experienced is very minute. Only with the 
advent of such activities as the 1000 hour burn-in 
testing, has the quantity of intermittents and the 
reporting structure been sufficient to justify the 
recording of intermittent malfunction data. 

The field failure reporting has been at the customers' 
discretion. The failure data reported from the field 
is made up almost exclusively of hard failures. Due 
to the complex nature of customer operational software 
and customer hardware configurations, of which Sperry 
Univac normally provides only the computer, it is most 
likely that intermittent failures (particularly those 
with long time between manifestations) are rarely iso- 
lated and consequently not reported in the field unless 
they become hard or their frequency increases to the point 
where they appear hard. 

Due to the lack of data on isolated intermittent failures as 
explained above, the only method of arriving at a data base 
pertaining to intermittent failures was to examine the reported 
hard failures and decide which failure mechanisms could manifest 
themselves as intermittents. This decision was arrived at by a 
joint effort by engineering personnel from the Sperry Univac 
Product Reliability Department and Failure Analysis Laboratory. 
Each failure mechanism was examined and placed in one of the 
following categories based on the best judgement of the above 
mentioned departments: 

Intermittent - A relatively high possibility of 
causing intermittent hardware failure. 

Potentially Intermittent - Some possibility of causing 
intermittent hardware failure. 

Hard Failure - Little possibility of causing 
intermittent hardware failure. 

Due to the lack of empirical data, the above failure 
categorization was accomplished by engineering judgement. 
Confidence in this categorization will be maintained until data is 
available to either confirm or reject any of these judgements. 

Given these constraints and the data base that was 
available for this study, all reporting and analysis of 
failures in this study are on field failures after they became 
hard. 

4.0 Description of Sperry Univac's Failure 
Reporting System 


4.1 Fail Codes 

In Sperry Univac's failure reporting system there are 174 fail 
codes used to describe the failure mechanism. These refer to 
failures of an electrical, magnetic, electro-magnetic, and 
mechanical nature. Of these 174 codes, 43 are not applicable 
to this study, 86 would be considered "hard", 28 are considered 
potentially intermittent, and 17 are considered intermittent 
according to the definition of these classes in 3.2. Brief 
descriptions of the codes classified as intermittent or potentially 
intermittent are given in Figures 1-5. 

Some contracts on individual computers call for the reporting of 
field equipment utilization, failure reporting, and failure 
analysis. The data on these computers goes into Sperry Univac's 
failure reporting system. 

4.2 Reporting Forms 

The sources of the data for this study utilized three reporting 
forms. The first is an "Equipment Utilization Report". (See 
Appendix A.1.) This report is filled out monthly for each piece of 
equipment that is participating in the utilization reporting program. 
This report is used even if the equipment does not experience any 

POTENTIALLY INTERMITTENT 


Fail 
Code   Description 

10D - Broken Weld: possible intermittent operation resulting from 
      partial contact of wire to pad. 

11F - Smeared Open Chip Bond: possible intermittent failure 
      resulting from partial electrical contact of the lead wire to 
      the bond pad or bond to adjacent metal. 

11G - Smeared Open Post Bond: possible intermittent failure 
      resulting from partial electrical contact of the lead wire to 
      the bonding post. 

11L - Bond Short to Metallization or Chip Edge or Mislocated: 
      possible intermittent operation caused by partial shorting 
      of the wire bond to metal interconnects or adjacent bond 
      pads. 

12G - Interlayer Metal Short: possible intermittent operation 
      resulting from partial shorting of metal interconnects (used 
      for multilayer metal devices). 

13C - Cracked Die: possible intermittent failure resulting from 
      partial electrical contact of the parts of the semiconductor 
      die. 

15A - Out of Spec (Elect): possible intermittent operation 
      resulting from out of specification electrical parameters; this 
      is dependent upon operating design margins. 

15E - Slow Recovery: possible intermittent operation caused by 
      slow reverse recovery (Trr) of diodes; this is dependent upon 
      the design operating margins. 

[code illegible] - Core Cracked/Defective/Noisy: possible intermittent 
      operation caused by cracked/defective/noisy cores resulting in 
      bits being "picked" or dropped. 


Figure 1 

POTENTIALLY INTERMITTENT (continued) 

15G - Early Peaking Core: possible intermittent operation caused 
      by an early loss of core signal output; this is dependent 
      upon the design operating margins. 

20K - Timing - Delay Line Taps: possible intermittent operation 
      caused by out of specification timing adjustment of delay 
      line. 

21A - Delay Time: possible intermittent operation caused by out 
      of specification delay time of printed circuit assemblies or 
      subassemblies. 

21L - Low Output: possible intermittent operation caused by an 
      output signal which does not achieve the specified output 
      level. 

21M - Magnetostriction: possible intermittent operation caused by 
      a change in electrical characteristics (e.g. ringing) of a 
      core caused by excessive external pressure. 

22H - Not Verified, Elect cause unknown 
22J - Not Verified, Elect plating anomalies 
22K - Not Verified, Elect restriction of wire 
22L - Not Verified, Elect scratch/abrasion 
22M - Not Verified, Elect bond 
22N - Not Verified, Elect corrosion 
22P - Not Verified, Elect substrate defects 
22Q - Not Verified, Elect nonrestrict foreign material 

      Failures with the above fail codes could be considered to 
      cause possible intermittent operation since a failure was 
      experienced for which no cause could be determined but only 
      suspected. 

23B - Noisy Bit: possible intermittent operation caused by excessive 
      noise, ringing, excessive recovery, or impedance mismatch of 
      a core or film output signal. 


Figure 2 

POTENTIALLY INTERMITTENT (continued) 

23K - Weak Bit: possible intermittent operation caused by a 
      narrow output pulse or an output level below that for system 
      operation (see 21L). 

31A - Unverified failure 
31C - No defect found by failed item analysis 
31H - Unverified failure/suspect part replaced 
31J - Scrap-unverified failure 

      Failures within the above codes could cause intermittent 
      operation since a failure did exist which could not be 
      verified through failure isolation. 


Figure 3 
INTERMITTENT 


10G - Shorted Lead Wire, Poor Lead Dress: intermittent shorting 
      to the edge of the die or adjacent wire bonds. 

10L - Internal Particle or Contamination: intermittent shorting 
      between die metallization stripes, bonding pads/wires or 
      edge of die to package. 

10N - Lead or Metal Migration (Grow Back): intermittent contact 
      of metal links, originally fused to create an open (logic 
      "1"); this is used primarily for PROM's with fused link 
      technology. 

11D - Plagued Open Chip Bond: 
[code illegible] - Plagued Open Post Bond: intermittent open of the 
      chip or post bond resulting from the formation of 
      "purple-plague" in Au-Al intermetallic systems. 

11H - Underbonded Chip Bond: 
11J - Underbonded Post Bond: intermittent open of the chip or 
      post bond resulting from inadequate ultrasonic bonding 
      interface in Al-Al systems. 

12B - Open Metallization Due to Microcrack: intermittent open 
      of metallization stripes, primarily over ohmic steps, 
      resulting from discontinuous (cracked) metallization. 

12C - Open Metal Electromigration: intermittent open metallization 
      due to migration within thin areas of metal stripes caused 
      primarily by a combination of excessive current density and 
      temperature. 

15M - Pattern Sensitive: intermittent logic failure resulting from 
      a particular pattern within memory causing an undesired 
      change of memory bit (primarily used for RAM's). 

20B - Bent, Broken or Pushed in Pins: intermittent open contacts 
      resulting from damaged connector pins. 


Figure 4 



INTERMITTENT (continued) 


20C - Cold Flow, Abraded or Damaged Wire Insulation: intermittent 
      shorting resulting from damaged wire insulation causing 
      shorts to adjacent connector pins, wires, terminals or ground. 

20F - Warped, Splitting, Uneven Mat Area: intermittent electrical 
      failure caused by a change of magnetic core characteristics 
      or core damage resulting from warped, split or uneven core 
      mat. 

21G - Damaged Foil: intermittent open caused by raised or damaged 
      metallic interconnects (foil) on a printed circuit card. 

23G - Disturb: intermittent logic failure within memory resulting 
      during a READ or WRITE cycle at one location causing another 
      location to change states. 

30H - Reseated Cards: intermittent failure resulting from improperly 
      seated or unseated printed circuit cards causing intermittent 
      connection. 

31D - Intermittent/Cause Unknown: intermittent computer, assembly 
      or sub-assembly failure experienced for which no specific 
      cause could be established. 


Figure 5 
failures. When a computer that is covered by this reporting 
system experiences a failure and the failure results in 
a repair, it is reported on either a "Failure/Malfunctional 
Report" (FMR) or an "Equipment Malfunction Report" (EMR). 
See Figure A.2 and Figure A.4 for the format of these reports. 
Figures A.3 and A.5 of Appendix A explain 
the fields contained in the reports. When an EMR 
or FMR is filled out, the failing assembly and the form are sent 
back to the factory. The failing assembly is analyzed to determine 
the cause of the failure. The information on the report 
is then entered into a data base. All of the computers using 
the utilization reporting system use the FMR or the EMR; however, 
not all of the computers using the FMR or EMR use the equipment 
utilization report. Part of this study required that the 
number of computers under investigation be known for each time 
interval. For this reason, only the 169 computers that 
are in the field utilization program were used in the 
distribution analysis. 

An example of the raw failure data is given in Figure 6. This 
failure was isolated to a control memory printed circuit card 
in the field. The failure analysis laboratory determined that 
the failure was in the integrated circuit at location 16 on the 
card and that the failure mechanism within the chip was open metal 
electromigration (12C). The FMR and EMR both contain a block 
within field 36 to explain the observed failure characteristics; 


Figure 6. Example of Field Failure Report (form image; the block of 
observed failure characteristics lists DIAG DETECT, DIAG ISOLATE, 
LOAD FAILURE, HEAT SENSITIVE, SHOCK SENSITIVE, and INTERMITTENT) 


"INTERMITTENT" is one of the possible characteristics to check in 
this block. Unfortunately, reporting in this block has been erratic 
and this block was not entered into the data base. For this 
reason, the failure mechanism was used to identify which 
failures may have been intermittent prior to going hard. 

Once the EMR and FMR are completed, the data is entered into 
Sperry Univac's reporting and retrieval system. The computer 
file has, theoretically, a field for every block of data on the 
EMR or FMR. The failure data can be sorted and ranked by the 
fields in any order, which allows quick 
and easy access to the specific information that the user wants. 

An example of retrieval data is given in Figure 7. 

4.3 Components 

A brief description of the components that Sperry Univac uses 
is as follows: 

1) Integrated Circuits: The integrated circuits used are 
   purchased to Sperry Univac specifications which require 
   processing, inspection and both screening and sample testing in 
   accordance with MIL-M-38510 and MIL-STD-883 for Class B 
   devices. 

2) Semiconductor Devices: The semiconductor devices used are 
   purchased to Sperry Univac specifications which require 
   processing, inspection and both screening and sample testing in 
   accordance with MIL-S-19500 and the applicable slash 
   specs for JAN TX devices. 

3) Passive Devices: The majority of the passive devices used 
   are MIL or ER qualified and are purchased to the 
   applicable military specifications. 

4.4 Data Base 

Sperry Univac had four programs in the above data reporting 
system which were applicable to this study. The applications of 
these programs were two shipboard, one submarine, and one avionics. 
For these programs approximately 21,000 field failures were on 
file from the past five years. However, not all the failures 
in this data base were reported with the time of failure (Elapsed 
Time Meter). In addition, the reporting system is dynamic with 
computers of all age groups being included. It was decided to 
concentrate upon the one ship-board program that made up the 
majority of the failures and the population of computers in our 
data base. To address the problem of changes in the occurrence 
of failures over time, it was decided to "freeze" the data base 
into three time periods and to include a computer in the time 
period only if that computer ran throughout the entire time 
period. 

4.5 Computer Description 

The computer which yielded sufficient data for use in this study 
is a highly reliable, ruggedized multiple-processor system 
designed by Sperry Univac for military applications. To meet 
stringent environmental and functional specifications, this 
computer was designed to meet MIL-E-16400 (ship and shore) 
environmental requirements. Other specifications and standards used 
for design objectives are as follows: 

Radio Frequency Interference: MIL-I-16910 
Shock: MIL-S-901 Class I Medium Weight 
Vibration: MIL-STD-167 Type I 
Salt Spray: FED-STD-151 Method 811 

Environmental Characteristics: 

Temperature Range: 
-54°C to +65°C (Operating) 
-62°C to +75°C (Storage) 

Relative Humidity to 95% 

This computer is comprised of one or more of each of the following 
modules: 

Central Processor 
Input/Output Controller 
Memory 
Input/Output Adapters 
Power Supplies 

With the exception of the power supply, each module has a 
wire-wrapped back panel terminating in receptacles that mate with the 
male connectors on the printed circuit cards and memory modules. 
All heat dissipated by circuit elements is transferred to the top 
of the card or memory assembly by thermal conduction to metallic 
"T" bars. The assembled module is closed by a heat-exchange 
cover which makes thermal contact with all "T" bars. Ambient 
air drawn through the heat exchanger by the cabinet cooling 
system removes heat to the outside. 

Man/Machine interface for maintenance actions is accomplished 
via a maintenance unit panel which provides operation controls and 
indicators which present internal computer register values needed 
to isolate printed circuit card failures. 

This computer is presently in operation in both shipboard and 
shore based applications. Due to the reporting structure 
comprising the data base available to Sperry Univac, only the shore 
based computers are involved in this study. The environment of 
the study-related computers is that of normal commercial com- 
puter center operations. This implies ambient air temperatures 
of 70°F to 80°F with no shock or vibration exposure. 
5.0 Study at the Computer Level 


5.1 Histograms 

All discussion that is to follow refers to the one computer 
discussed in Sections 4.4 and 4.5. The failure data was put 
into histograms for the following running time periods: 
10,000 hours, 5,000 hours, and 2,000 hours. These histograms 
reflect the hard failures, intermittent failures, and potential 
intermittent failures for that period. The data for the three 
time periods is based on a fixed number of computers for each 
period, as follows: 

Time Period          Number of Computers 

0 -  2,000 hrs              169 
0 -  5,000 hrs              116 
0 - 10,000 hrs               48 


These histograms are shown in Figures 8 through 16. The data 
has been screened to eliminate failures which may skew the data. 
In addition, the screening determined that if a computer had more 
than one failure, they occurred in different modules and at 
different times so that the failures can be assumed to be 
independent. The data in these histograms represents the 
first look at the computers in the reporting system. They have one 
limitation in that the failures are grouped in 250-hour blocks 
of time; it was not possible to obtain raw data for this 
portion of the study. An interesting observation is that no 
intermittent failures were observed after 8000 hours. Appendix 
B.2 has the breakdown by time periods. 

[Figures 14 and 15: failure histograms for the 0-10,000 hour period; the one legible panel title is HARD FAILURES] 

5.2 Analysis 

The failure data presented in these histograms was analyzed 
with respect to time to failure. Figure 17 lists the 
distribution functions for time to failure. In Appendix B.3, 
confidence intervals for the mean time to failure for the exponential 
distributions are given. The special form of the distribution 
for potentially intermittent failures in 0-10,000 hours (see 
Figure 14) suggests considering the time intervals 0-5000, 5000- 
8250, and 8250-10,000 separately when determining confidence 
intervals for the parameters; this is what was done in Appendix 
B.3. 

For the Weibull distribution, confidence intervals for the param- 
eters require the data to appear in ungrouped form which was not 
available. However, since the rank distribution of failures fol- 
lows a beta distribution, confidence intervals for the fraction 
of failures are possible for the Weibull cases. At each time 
listed, there is a 90% chance that the fraction of failures that 
have occurred will be between the two values given. For example: 
In the potentially intermittent failures 0-2000 hours, one would 
expect by the time of 1000 hours between 27 and 32% of the fail- 
ures to have occurred with a confidence of 90%. 
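
The beta-distribution argument can be sketched numerically. The fraction failed at the i-th of n ordered failures follows a Beta(i, n - i + 1) distribution, whose cumulative distribution equals an upper binomial tail, so its percentiles can be found by bisection with the standard library alone. The function names and the example values i = 20, n = 66 are illustrative assumptions, not figures taken from this study.

```python
from math import comb

def rank_cdf(p, i, n):
    """P(fraction failed at the i-th of n ordered failures <= p).

    This is the Beta(i, n - i + 1) CDF, written as a binomial tail sum.
    """
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))

def rank_quantile(q, i, n, tol=1e-10):
    """Invert rank_cdf by bisection (the CDF is increasing in p)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rank_cdf(mid, i, n) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rank_interval(i, n, level=0.90):
    """Two-sided interval for the fraction failed at the i-th failure."""
    alpha = (1 - level) / 2
    return rank_quantile(alpha, i, n), rank_quantile(1 - alpha, i, n)

# Hypothetical example: the 20th failure out of 66 (assumed values).
low, high = rank_interval(20, 66)
```

Reading the interval at the failure number reached by a given time gives a statement of the kind quoted above: with 90% confidence, the fraction of failures that have occurred lies between `low` and `high`.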

5.3 Procedure 

The first attempt in all cases was to fit an exponential 
distribution to the data. The estimate of the mean time to failure 
was total test time/total number of failures. The chi-square test 
for goodness of fit was then used. For those distributions 
where the fit was poor, a Weibull distribution fit was attempted. 
To fit the Weibull, the data was ranked. Because the data was 
grouped, it was assumed that the last failure in each time 
interval occurred at the endpoint of the time interval. 
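
A minimal sketch of the exponential step follows, under stated assumptions. The total test time is taken as the number of computers times the window length, and the failure count of 41 is back-calculated from the 0-2000 hour mean in Figure 17 rather than stated in the text. The grouped counts and the 500-hour cells are hypothetical, and conditioning the cell probabilities on failure within the window is an assumption of this sketch, not a documented step of the study.

```python
from math import exp

def mttf_estimate(total_test_hours, n_failures):
    """Exponential mean time to failure: total test time / number of failures."""
    return total_test_hours / n_failures

def cell_probabilities(edges, theta):
    """Exponential probability of each cell, conditioned on failing before edges[-1]."""
    cdf = [1.0 - exp(-t / theta) for t in edges]
    span = cdf[-1] - cdf[0]
    return [(cdf[k + 1] - cdf[k]) / span for k in range(len(edges) - 1)]

def chi_square(observed, probabilities):
    """Pearson chi-square statistic for grouped counts."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probabilities))

# 169 computers x 2000 hours; the count of 41 failures is an assumed,
# back-calculated value.
theta_2000 = mttf_estimate(169 * 2000, 41)

# Hypothetical grouped counts in four 500-hour cells, for illustration only.
edges = [0, 500, 1000, 1500, 2000]
observed = [12, 10, 10, 9]
statistic = chi_square(observed, cell_probabilities(edges, theta_2000))
```

The statistic is then compared with a tabled chi-square value whose degrees of freedom are the number of cells, less one, less the number of estimated parameters.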

The estimates of the Weibull shape and scale parameters were 
taken from the best fitted line of ln(time to failure of the 
cumulative ith failure) versus 
$\ln \ln \left[ \left( 1 - \frac{i - 0.3}{n + 0.4} \right)^{-1} \right]$, 
where i is the cumulative failure number and n is the total number 
of failures. The criterion for testing the Weibull distribution 
fit was the Kolmogorov-Smirnoff statistic. 
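
The line fit described above can be sketched as an ordinary least-squares regression. This sketch assumes the ln ln term is regressed on ln(time), so that the slope is the shape parameter and the scale follows from the intercept; the text does not say which variable was treated as dependent. The function names are illustrative, and the check uses synthetic times generated from the 0-5000 hour parameters listed in Figure 17, not the study's raw data.

```python
from math import exp, log

def median_ranks(n):
    """Plotting positions (i - 0.3) / (n + 0.4) for i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def fit_weibull(times):
    """Least-squares line through (ln t_i, ln ln(1 / (1 - F_i))).

    Returns (shape, scale): slope = shape, intercept = -shape * ln(scale).
    """
    t = sorted(times)
    n = len(t)
    x = [log(v) for v in t]
    y = [log(log(1.0 / (1.0 - f))) for f in median_ranks(n)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    shape = sxy / sxx
    intercept = y_bar - shape * x_bar
    scale = exp(-intercept / shape)
    return shape, scale

# Synthetic times lying exactly on a Weibull line (illustration only).
true_shape, true_scale = 1.23, 3011.86
synthetic = [true_scale * (-log(1.0 - f)) ** (1.0 / true_shape)
             for f in median_ranks(20)]
shape, scale = fit_weibull(synthetic)
```

For grouped data, the times passed in would be the cell endpoints assigned to the last failure in each interval, as described in the preceding paragraph.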

For the 0-10,000 hour distributions the limitations of the chi-square 
goodness of fit test became apparent. The test is sensitive to 
the number of cells used, the expectation of each cell, the 
expectation varying from cell to cell, the sample size, and the 
testing of a continuous distribution. For the 0-10,000 hour 
intermittent failures, it was difficult to obtain a constant expectation 
from cell to cell, or an expectation of at least 5 for the 
potentially intermittent failures. An alternative that is recommended 
in the literature is the Kolmogorov-Smirnoff test for goodness of 
fit. The theory has been developed, however, for ungrouped data, 
and only limited results are available in the literature for grouped 
data with 30 or fewer observations. There is a procedure to obtain 
a conservative upper bound on the Kolmogorov-Smirnoff statistic 
when the data is already grouped. This procedure follows: 

Let $\hat{F}_i$ refer to the observed cumulative distribution value of 
the ith cell and $F_i$ the fitted cumulative distribution value at 
the right end point of the ith cell, $i = 1, 2, \ldots, n$. Let 
$\hat{F}_0 = F_0 = 0$. Observe that for each cell and every x that is 
sampled from that cell: 

$$
\begin{aligned}
\hat{F}(x) - F(x) &\le \hat{F}_i - F_{i-1} && \text{if } \hat{F}(x) > F(x) \\
F(x) - \hat{F}(x) &\le F_i - \hat{F}_{i-1} && \text{if } F(x) > \hat{F}(x)
\end{aligned}
\tag{1}
$$

Hence for the ith cell: 

$$
\max_{x \in i\text{th cell}} \left[ \max\left( \hat{F}(x) - F(x),\; F(x) - \hat{F}(x) \right) \right]
\le \max\left( \hat{F}_i - F_{i-1},\; F_i - \hat{F}_{i-1} \right)
\tag{2}
$$

But this can be rewritten as: 

$$
\max_{x \in i\text{th cell}} \left| \hat{F}(x) - F(x) \right|
\le \max\left( \hat{F}_i - F_{i-1},\; F_i - \hat{F}_{i-1} \right)
\tag{3}
$$

So finally: 

$$
D_n = \sup_x \left| \hat{F}(x) - F(x) \right|
= \max_{\text{all cells}} \left( \max_{x \in i\text{th cell}} \left| \hat{F}(x) - F(x) \right| \right)
\le \max_{\text{all } i} \left[ \max\left( \hat{F}_i - F_{i-1},\; F_i - \hat{F}_{i-1} \right) \right]
\tag{4}
$$

If the right hand side of (4) is less than a tabled value of the 
Kolmogorov-Smirnoff statistic, then clearly $D_n$ is less by 
transitivity and the distribution would be acceptable as a good fit. 
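
The bound on the right hand side of (4) needs only the observed and fitted cumulative values at the right endpoint of each cell. The sketch below takes both cumulative values as zero before the first cell. The counts, the cell width, and the exponential mean are hypothetical values chosen for illustration.

```python
from math import exp

def ks_upper_bound(observed_cdf, fitted_cdf):
    """Conservative upper bound on the Kolmogorov-Smirnoff statistic.

    Both lists hold cumulative values at the right endpoint of each cell.
    Within a cell, the observed CDF is at most its right-end value and the
    fitted CDF is at least its left-end value (and vice versa).
    """
    bound = 0.0
    prev_obs = prev_fit = 0.0  # both CDFs taken as 0 before the first cell
    for obs, fit in zip(observed_cdf, fitted_cdf):
        bound = max(bound, obs - prev_fit, fit - prev_obs)
        prev_obs, prev_fit = obs, fit
    return bound

# Hypothetical grouped data: failure counts in four 250-hour cells.
counts = [4, 3, 2, 1]
edges = [250, 500, 750, 1000]
total = sum(counts)
observed, running = [], 0
for c in counts:
    running += c
    observed.append(running / total)

theta = 400.0  # assumed exponential mean, illustration only
# Fitted CDF conditioned on failure before the last edge, so it reaches 1.
fitted = [(1 - exp(-t / theta)) / (1 - exp(-edges[-1] / theta)) for t in edges]
bound = ks_upper_bound(observed, fitted)
```

Because the bound is never smaller than the largest endpoint difference, a fit accepted on the bound would also be accepted on the exact statistic; a fit rejected on the bound is undecided.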

Figure 17 lists the fitted density functions that best describe 
the failure phenomenon of the computer. In addition we listed the 
values of the chi-square statistic and the upper bound for the 
Kolmogorov-Smirnoff statistic. The one situation where the 
Weibull and exponential fit was poor was the potentially 
intermittent 0-10,000 hour case. The data as seen in the histogram 
of Figure 14 suggests a multimodal distribution that repeats 
itself after 5000 and 8250 hours. A piecewise fitting by the 
exponential distribution was attempted. The parameters were 
calculated by the statistic mentioned above. The constants 
$a_1$, $a_2$, $a_3$ are factors used to normalize the area under the pdf 
curve to 1. They are found by evaluating 

$$
a_i = \frac{x_i / n}{\int_{t_{i-1}}^{t_i} f(t)\,dt}
$$

where $x_i$ is the number of failures occurring in the time period 
$(t_{i-1}, t_i)$, f is the density function for that time period, and n is 
the total number of failures. The fit over the full 10,000 hours 
is acceptable. 
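
The normalization can be checked against the values in Figure 17. The sketch below assumes each exponential piece restarts at the beginning of its own time period (time measured from 0, 5000, and 8250 hours respectively). Multiplying each listed constant by the integral of its piece recovers the fraction of failures in that period, and the three fractions should sum to 1. The function names are illustrative.

```python
from math import exp

# (period start, period end, exponential mean, constant) read from Figure 17.
SEGMENTS = [
    (0.0, 5000.0, 6858.71, 0.866905),
    (5000.0, 8250.0, 6000.0, 0.7970245),
    (8250.0, 10000.0, 4941.2, 0.730778),
]

def segment_integral(start, end, theta):
    """Integral of (1/theta) * exp(-(t - start)/theta) over [start, end)."""
    return 1.0 - exp(-(end - start) / theta)

def normalizing_constant(failures_in_period, total_failures, start, end, theta):
    """Constant that scales one piece so its area equals its failure fraction."""
    return (failures_in_period / total_failures) / segment_integral(start, end, theta)

def implied_fractions(segments):
    """Fraction of all failures carried by each piece of the fitted pdf."""
    return [a * segment_integral(s, e, th) for s, e, th, a in segments]

fractions = implied_fractions(SEGMENTS)
total_area = sum(fractions)  # should be close to 1 if the pdf is normalized
```

Under this reading the total area comes out at 1 to within rounding of the listed constants, which supports the interpretation of the constants as period-by-period normalizing factors.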

5.4 Conclusion of Unit Study 

The computers in each time period were in a repair mode; that is, 
when a computer failed it was repaired and allowed to continue 
to run. The data was screened to ensure that there was independence 
between failures in the same computer. The drawback is that the 
data was only available in grouped form. Another limitation is 
that the window size of units in the 0-10,000 hour time period, 
48, was small and could lead to the pattern of failures that is 
seen in Figures 14 and 15. This sample size for the 
0-10,000 hour period makes the pdf's found for this period 
questionable. However, this sample and its failures represent 
all the good data that Sperry Univac had available for this 
time period at the time this study was made. 

The modeling of time to hard failures at the computer level was 
done when the data base was frozen and the failures of 
microcircuits were retrieved. The numbers of computers in each window 
are given in Figure 18. The raw data of time to failure was 
available for this part of the study. The models of exponential 
or Weibull time to hard failure were rejected by the conventional 
tests of goodness of fit. Figure 17a summarizes the modeling 
that was done. It is seen that the MTBF is increasing as the 
age of the computer increases. 
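
The pdf of fitted distribution of time to failure. 
"The $\chi^2$ and D values should be compared with 
the tables of Chisquare and Kolmogorov-Smirnoff." 


Intermittent Failures 

0 - 2000 hr. 
   pdf:  $\frac{1}{8243.9} \exp\left( -\frac{t}{8243.9} \right)$ 
   Test for accepting pdf:  $\chi^2(6) = 3.45$ 

0 - 5000 hr. 
   pdf:  $\frac{1}{11{,}600} \exp\left( -\frac{t}{11{,}600} \right)$ 
   Test for accepting pdf:  $\chi^2(10) = 4.18$ 

0 - 10000 hr. 
   pdf:  $\frac{.9734}{(9409)^{.9734}} \, t^{-.0265} \exp\left[ -\left( \frac{t}{9409} \right)^{.9734} \right]$ 
   Test for accepting pdf:  Kolmogorov-Smirnoff, $D_{31} < .0943$ 


Potentially Intermittent 

0 - 2000 hr. 
   pdf:  $\frac{.9305}{(4129.08)^{.9305}} \, t^{.9305 - 1} \exp\left[ -\left( \frac{t}{4129.08} \right)^{.9305} \right]$ 
   Test for accepting pdf:  Kolmogorov-Smirnoff, $D_{66} \le .0813$ 

0 - 5000 hr. 
   pdf:  $\frac{1.23}{(3011.86)^{1.23}} \, t^{.23} \exp\left[ -\left( \frac{t}{3011.86} \right)^{1.23} \right]$ 
   Test for accepting pdf:  Kolmogorov-Smirnoff, $D_{101} \le .1102$ 

0 - 10000 hr. 
   pdf: 
   $\frac{a_1}{6858.71} \exp\left( -\frac{t}{6858.71} \right)$,  $0 \le t < 5000$ 
   $\frac{a_2}{6000} \exp\left( -\frac{t - 5000}{6000} \right)$,  $5000 \le t < 8250$ 
   $\frac{a_3}{4941.2} \exp\left( -\frac{t - 8250}{4941.2} \right)$,  $8250 \le t < 10000$ 
   $a_1 = .866905$,  $a_2 = .7970245$,  $a_3 = .730778$ 
   Test for accepting pdf:  $\chi^2(12) = 20$ 


Figure 17 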


Description        CDF F(t), legible terms only        MTBF 

Hard Failures 

0-2000 hr.         -.0454, .0523, .81                  3663 
0-5000 hr.         -.1403, .62                         4860 
0-10,000 hr.       -.1015, .64                         5655 


Figure 17a 



6.0 Microcircuit Failure Data 


6.1 General Information 

The data base was studied according to the three time periods: 
0-2,000, 0-5,000, and 0-10,000 hours. Due to the dynamic nature of 
the data reporting system, the numbers of computers in each 
window changed slightly from when the study at the unit level 
was made. The data base is composed of 196 computers that have 
run at least 2,000 hours. Of these computers, 139 have run at 
least 5,000 hours and 67 of these 139 have run at least 10,000 
hours. The reference to failures in this paper refers to solid 
failures that have been categorized by Sperry Univac into 
intermittent, potentially intermittent, and hard classes. For brevity 
in the tables, these are referred to as I, II, and III respectively. 

There are 18 micro circuit types included in this study. These 
comprise all digital microcircuits of the computer of this study. 

The 18 types made up the population of 1,552,649 in the 0-2,000 
hour period, 1,131,981 in the 0-5,000 hour period, and 528,577 
in the 0-10,000 hour period. Of these 18 types, 8 types had no 
failures of any kind and were a total of 20,622 or 1 . 3 % of the 
1,552,649. For all failures, it was determined from the data base 
that no two microcircuits failed on the same card so that there 
is independence in the failures observed. The data base was also 
screened for failures that skewed the data, e.g. non-relevant 
overstress failures. Figure 18 summarizes the important information. 
Time       # Of         # Of       Most Failing      # Of             # Of         # Of
Period     Computers    IC         Digital Device    Intermittents    Pot. Int.    Hard

0-2000     196          1552649    58043             11               7            26
0-5000     139          1131981    42249             27               10           44
0-10000    67           528577     19637             8                8            35


Total of 103 failures
Part hours: 3.1053 X 10^9 for 0-2000 hours
            5.6599 X 10^9 for 0-5000 hours
            5.2858 X 10^9 for 0-10000 hours

Computers in the 0-2000 group with no IC Failure     629,370 IC's
Computers in the 0-5000 group with no IC Failure     277,814 IC's
Computers in the 0-10000 group with no IC Failure    91,764 IC's

Figure 18. Summary of Information
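
The part-hour totals in Figure 18 are consistent with multiplying each IC
population by the length of its window, i.e., with every device credited for
the full period. The short Python sketch below reproduces that arithmetic
under that assumption; the dictionary and function names are hypothetical,
and the counts are those listed in the figure.

```python
# Window length (hours), IC population, and failure counts (I, II, III)
# as listed in Figure 18.
FIGURE_18 = {
    "0-2000": (2_000, 1_552_649, (11, 7, 26)),
    "0-5000": (5_000, 1_131_981, (27, 10, 44)),
    "0-10000": (10_000, 528_577, (8, 8, 35)),
}


def part_hours(window_hours, ic_count):
    """Device-hours, assuming every IC accumulates the full window."""
    return window_hours * ic_count


def failed_fraction(ic_count, failures):
    """Fraction of the IC population with a recorded failure of any class."""
    return sum(failures) / ic_count


def summarize():
    """Map each window label to (part hours, failed fraction)."""
    rows = {}
    for label, (hours, ics, failures) in FIGURE_18.items():
        rows[label] = (part_hours(hours, ics), failed_fraction(ics, failures))
    return rows
```

The failed fraction for the 0-10000 window comes out just under one part in
ten thousand, which is the scale of the ratios discussed later in the text.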



6.2 Analysis 


The probability distribution functions for time to failure are
in Figure 19. A comparison of the reliability functions according
to the empirical, Weibull, and exponential distributions for
each time period and failure type is in Appendix C. The
criteria for determining, from the results in Appendix C.1,
which of the two distributions, Weibull or exponential, appears
in Figure 19 are the precision and the maximum error of the two
distributions relative to the empirical data. For example, the
Weibull distribution for the 0-5,000 II group has a maximum error
of 1.2 failures, while the exponential has a maximum error of four
failures. For the 0-2,000 I, 0-5,000 III, and 0-10,000 III groups,
the Weibull distribution has a greater maximum error than the
exponential. However, this error occurs towards the end of the time
periods, and the Weibull gives a consistently better fit than the
exponential. Therefore, the Weibull distribution was used in
Table C.
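
The maximum-error criterion described above can be sketched in Python as
follows, using the 0-10,000 I failure times listed in Appendix C.1 and the
Weibull parameters given for that group in Figure 19. This is an
illustration, not the study's own program: the function names are
hypothetical, and the exponential scale is an assumption (total part hours
divided by the number of failures).

```python
import math

N_ICS = 528_577  # 0-10,000 hour IC population
FAIL_TIMES = [1017, 1938, 1983, 2622, 2799, 3911, 3949, 4603]  # hours, group I


def weibull_failed(t, theta, beta):
    """Weibull probability of failure by time t (beta = 1 gives exponential)."""
    return -math.expm1(-((t / theta) ** beta))


def max_survivor_error(times, n, failed):
    """Largest |predicted - observed| survivor count over the failure times."""
    worst = 0.0
    for k, t in enumerate(sorted(times), start=1):
        observed = n - k               # survivors just after the k-th failure
        predicted = n - n * failed(t)  # survivors predicted by the model
        worst = max(worst, abs(predicted - observed))
    return worst


weibull_err = max_survivor_error(
    FAIL_TIMES, N_ICS, lambda t: weibull_failed(t, 4_555_450.0, 1.6017))

# Assumed exponential scale: part hours / failures for this window.
theta_exp = N_ICS * 10_000 / len(FAIL_TIMES)
exp_err = max_survivor_error(
    FAIL_TIMES, N_ICS, lambda t: weibull_failed(t, theta_exp, 1.0))
```

Under these assumptions the Weibull error stays near one device while the
exponential error exceeds four, which is the kind of gap the selection
criterion is meant to expose.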


Appendix C.1 suggests that the rate of change of the intermittent
failure rates is increasing while the rate of change of the hard
failure rate is decreasing. One reason that the failure rate for
0-10,000 I is increasing is that the first failure occurs
after 1,000 hours and all the others occur within the next 3,600
hours. This contrasts with the earlier time periods, which had
failures observed as early as 100 hours. Additional data would be
necessary for the 0-10,000 time period to determine if the failure
rate of solid intermittents is increasing. Appendix C.2 gives
FAILURE   TIME        DISTRIBUTION FUNCTION F(t)                 DISTRIBUTION     HAZARD FUNCTION                             MTTF (HR)
TYPE      PERIOD                                                 FUNCTION TYPE

I         0-2,000     1 - exp[-(t/(9.254 X 10^11))^.603]         Weibull          .603 t^-.397 / (9.254 X 10^11)^.603         1.384 X 10^12
I         0-5,000     1 - exp[-(t/209,626,111)]                  Exponential      4.8 X 10^-9                                 2.096 X 10^8
I         0-10,000    1 - exp[-(t/4,555,450)^1.6017]             Weibull          1.6017 t^.6017 / (4,555,450)^1.6017         4.08 X 10^6

II        0-2,000     1 - exp[-(t/(6.4228 X 10^10))^.7]          Weibull          .7 t^-.3 / (6.4228 X 10^10)^.7              8.13 X 10^10
II        0-5,000     1 - exp[-(t/(4.8728 X 10^12))^.6179]       Weibull          .6179 t^-.3821 / (4.8728 X 10^12)^.6179     7.3325 X 10^12
II        0-10,000    1 - exp[-(t/(5.332 X 10^9))^.8111]         Weibull          .8111 t^-.1889 / (5.332 X 10^9)^.8111       5.98 X 10^9

III       0-2,000     1 - exp[-(t/(1.1943 X 10^8))]              Exponential      8.4 X 10^-9                                 1.1943 X 10^8
III       0-5,000     1 - exp[-(t/153,398,200)^.964]             Weibull          .964 t^-.036 / (153,398,200)^.964           1.56 X 10^8
III       0-10,000    1 - exp[-(t/619,088,092)^.8451]            Weibull          .8451 t^-.1549 / (619,088,092)^.8451        6.76 X 10^8


Figure 19. Summary of Predicted Distribution



the breakdown for the data in C.1 by vendor, and C.3 has the
breakdown by module function of quantities of microcircuits
and failures. The vendor-failure relationship is not very
strong. However, the input/output control function has the
most failures for all three failure categories. There is also
a positive correlation between quantities of integrated circuits
and quantities of failures.
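
For readers checking the entries of Figure 19, the hazard function and MTTF
of a two-parameter Weibull distribution follow directly from its scale and
shape. The Python sketch below uses the standard textbook formulas; the
function names are hypothetical, and the parameter values used in the
examples are the ones printed in the figure.

```python
import math


def weibull_cdf(t, theta, beta):
    """F(t) = 1 - exp(-(t/theta)^beta)."""
    return -math.expm1(-((t / theta) ** beta))


def weibull_hazard(t, theta, beta):
    """h(t) = (beta/theta) (t/theta)^(beta-1); constant when beta = 1."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)


def weibull_mttf(theta, beta):
    """MTTF = theta * Gamma(1 + 1/beta); equals theta when beta = 1."""
    return theta * math.gamma(1.0 + 1.0 / beta)


mttf_i_10000 = weibull_mttf(4_555_450.0, 1.6017)    # 0-10,000 I
mttf_iii_10000 = weibull_mttf(619_088_092.0, 0.8451)  # 0-10,000 III
hazard_i_5000 = weibull_hazard(1_000.0, 209_626_111.0, 1.0)  # 0-5,000 I
```

A shape parameter above 1 gives a hazard that rises with age and a shape
below 1 gives a falling hazard, which is how the increasing and decreasing
failure rates in the text can be read off the figure.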

Figure 20 has the calculations for the confidence intervals for
the parameter of the exponential pdf. Determining confidence
intervals for the parameters of the Weibull pdf is considerably
more difficult. One procedure that could be followed is that
described by J. F. Lawless in the November 1978 issue of
Technometrics.
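
As an illustration of the exponential case, a common two-sided interval for
the mean time to failure of a time-truncated test uses chi-square quantiles
with 2r + 2 and 2r degrees of freedom, where r is the number of failures.
The Python sketch below is an assumption about the general method, not a
reproduction of the calculations in Figure 20. The function names are
hypothetical, and the chi-square quantile is taken from the Wilson-Hilferty
approximation so that only the standard library is needed.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def exponential_mttf_interval(part_hours, failures, confidence=0.95):
    """Return (lower, point estimate, upper) for the exponential mean."""
    alpha = 1.0 - confidence
    point = part_hours / failures
    # Time-truncated test: 2r + 2 degrees of freedom for the lower limit.
    lower = 2.0 * part_hours / chi2_quantile(1.0 - alpha / 2.0, 2 * failures + 2)
    upper = 2.0 * part_hours / chi2_quantile(alpha / 2.0, 2 * failures)
    return lower, point, upper


# 0-5,000 I: 27 failures over 1,131,981 ICs credited with 5,000 hours each.
lower, point, upper = exponential_mttf_interval(1_131_981 * 5_000, 27)
```

The point estimate from this call agrees with the 209,626,111-hour scale
listed for the 0-5,000 I group, which suggests that entry is the ratio of
part hours to failures.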

6.3 Discussion of Significance 

The procedure followed is typical of most studies of this nature.
Screening was performed to get good independent data. With a
type one error of .05, i.e., a confidence level of .95, all
time periods with the exclusion of 0-10,000 hr III (hard failures)
would have the hypothesis of exponential pdf's accepted. The
goodness of fit test that was used is that of Gnedenko, which is
the most powerful test for exponentiality for censored samples.
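
One common form of the Gnedenko test splits the normalized spacings between
ordered failure times into an early and a late group and compares their
means; under the exponential hypothesis the ratio follows an F distribution.
The Python sketch below shows that form only. It is an assumption that this
matches the variant used in the study, the function name is hypothetical,
and the choice of split point r1 is left to the analyst.

```python
def gnedenko_q(failure_times, n_on_test, r1):
    """Return (Q, dof1, dof2) for the first r failures out of n units.

    Normalized spacings are (n - i + 1) * (t_i - t_(i-1)) with t_0 = 0.
    Under an exponential model Q is compared with F(2*r1, 2*(r - r1)).
    """
    times = sorted(failure_times)
    r = len(times)
    spacings = []
    previous = 0.0
    for i, t in enumerate(times, start=1):
        spacings.append((n_on_test - i + 1) * (t - previous))
        previous = t
    early_mean = sum(spacings[:r1]) / r1
    late_mean = sum(spacings[r1:]) / (r - r1)
    return early_mean / late_mean, 2 * r1, 2 * (r - r1)


# 0-10,000 I failure times from Appendix C.1, split into two halves.
q, dof1, dof2 = gnedenko_q(
    [1017, 1938, 1983, 2622, 2799, 3911, 3949, 4603], 528_577, 4)
```

A value of Q far from 1 in either direction, judged against the F tables,
would count against the constant-failure-rate hypothesis.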

A goodness of fit test for censored samples was done for the
Weibull pdf, using the suggestions of Michael and Schucany in the
November 1979 issue of Technometrics. For all cases, except
0-2,000 III and 0-10,000 III, the type one error is greater than
.2 for rejecting the Weibull hypothesis. For 0-5,000 III the error
would be .05 and for 10,000 III it would be .01. Thus, there is
good reason for accepting the Weibull pdf's.

Figure 20. 95% Confidence Intervals for Exponential pdf of Study

The literature does not consider the problem of confidence when
the magnitudes of the population and of the failures observed are
the size of those in our study. One could put little faith in the
study if one considered the ratio of quantity failed to population
size in our study, 6.6 x 10^-5, as unrepresentative of a mortality
study.

On the other hand, the most pessimistic MTTF determined in the
study suggests we should have to wait 328 years to have 28% of
the devices fail. Another factor that makes the conclusions of
this study doubtful is the evolving state of the art. The
microcircuits are constantly improving in reliability. The computer
we ship today has an MTBF that is greatly improved over the "same"
computer that was used in this study. Taking all of these
factors into account, this study reflects the current state of the
art of SSI digital devices operating in the field.
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7.0 Conclusions 


An intermittent failure is a detected malfunction of a logic net
which resumes normal operation prior to the time needed to isolate
the malfunctioning device. Due to the impracticability of having
Sperry Univac's customers record the manifestation of intermittent
failures, the phenomenon is not currently in Sperry Univac's
reporting system. However, Sperry Univac engineers have determined
which failure mechanisms could be intermittent before they
go hard and are reported.

To study the failure phenomenon, three time periods (0-2,000,
0-5,000, and 0-10,000 hours) were established, and three
non-exclusive groups of computers were determined. These
computers were in a repair environment. Failures in the study
at the computer level included non-microcircuit devices.

The best fitting distributions of time between intermittent
failures are exponential for the 2,000 and 5,000 hour time periods.
The distribution is Weibull for the 10,000 hour time period.
The confidence intervals for MTBF indicate that the MTBF increases
as the time period increases. This suggests that the occurrence of
intermittent failures is more frequent in the early life of the
computer. The potentially intermittent failure class shows a
Weibull distribution of time between failures in the 2,000 and
5,000 hour time periods. The failure rate for potential
intermittents is increasing as the life of the computer increases.
The 10,000 hour time period appears to have a trimodal distribution
for potentially intermittent failures. This may be due to
the small number of computers in this time period.
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For the same three time periods, a study of the digital micro 
circuits of the selected computer was made. The best fitting
distributions for time to intermittent failure indicate that the
rate of change of the intermittent failure rate is increasing. This
means that a digital microcircuit is more likely to experience
intermittent failure as the circuit gets older. The potentially
intermittent failure class shows a Weibull distribution of time
to failure with a decreasing failure rate in all three time periods.
The hard failure class shows the opposite phenomenon of the
intermittent failure class. The rate of change of the hard failure
rate is decreasing, which means the microcircuits are less likely to
experience hard failures as they get older. These results apply
up to an age of 10,000 hours for the selected computer and the
microcircuits.

The data for the digital microcircuits occur in Type I censored
form. Methods that are discussed in the literature regarding
goodness of fit for censored samples were used. For all failure
classes and time periods, with the exception of the 0-10,000 hour
hard failure case, either the exponential or the Weibull distribution
could be used as a model. The final list of pdf's in Figure 19 is
based upon an examination of Appendix C for the precision and
accuracy of the fit. There is a positive correlation
(.84) between the number of microcircuits in a module and the
number of intermittent failures that module type has. The
distribution of microcircuit intermittent and potentially intermittent
failures according to vendor is not uniform. One vendor who supplied
1.6 percent of the microcircuit population had 66.7 percent of the
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intermittent failures. 


It is questionable whether the available data regarding the
number of failures, the time periods, and the populations of
microcircuits are adequate to establish accurate predictability.
After 10,000 hours, only .0096 percent of the population in
this time period has experienced a failure. It would
require, based upon the highest failure rate found, 29.9
years to have one percent of this population fail.
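
The kind of projection quoted above can be sketched by inverting a fitted
distribution function. The Python example below does this for a Weibull
model, using the 0-10,000 I parameters as read from Figure 19; the function
name and the hours-per-year constant are assumptions made for illustration,
and the study does not state exactly which inputs produced its 29.9-year
figure.

```python
import math

HOURS_PER_YEAR = 8760.0  # assumed: continuous operation, 365-day year


def weibull_time_to_fraction(p, theta, beta):
    """Time t at which a fraction p has failed: F(t) = p."""
    return theta * (-math.log1p(-p)) ** (1.0 / beta)


# Hours and years until one percent of the population has failed.
hours_to_one_percent = weibull_time_to_fraction(0.01, 4_555_450.0, 1.6017)
years_to_one_percent = hours_to_one_percent / HOURS_PER_YEAR
```

Under these assumptions the result is a little over 29 years, the same order
as the figure quoted in the text.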
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APPENDIX A.l 


SPERRY UNIVAC

EQUIPMENT UTILIZATION REPORT


SIGNIFICANT EVENTS AND ACCOMPLISHMENTS DURING REPORT PERIOD:

UNRESOLVED PROBLEMS:

NOTES:

OPERATIONAL STATUS:

□ OPERATIONAL □ LIMITED □ INACTIVE □ TRANSFERRED □ STORAGE □ DOWN

DISTRIBUTION: 1. white - FIELD ENG. TECH. SUPPORT GROUP   2. yellow - QUALITY PROGRAM GROUP
              3. pink - FIELD ENG. MTC. SUPPORT GROUP     4. g'rod - ORIGINATOR


UDI-3713
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sperryH^univac APPENDIX A. 2 Iifmrno. 

FAILURE/MALFUNCTION REPORT        D 53689 0

(Blank report form; only the block labels below are legible.)

13  TYPE: FAILURE, ADJUST, INTERMIT, PRI, MAINT, WEAR, SEC, BAD SP
26  COMPONENT PART NUMBER
39  ADDITIONAL DATA
33  LOCATION OR SITE
38  CONTRACT NUMBER
    EMPLOYEE NAME

CHECK (✓) IF YES

□ DIAG DETECT
□ DIAG ISOLATE
□ LOAD FAILURE
□ HEAT SENSITIVE
□ SHOCK SENSITIVE
□ INTERMITTENT
□ FAILURE VERIFIED
□ SPARE AVAILABLE

(ORIGINATOR - DO NOT WRITE BELOW THIS LINE)

APPENDIX A. 3 


SPERRY UNIVAC
COMPUTER SYSTEMS


FAILURE/MALFUNCTION REPORT

UDI-3180 (Rev. 8/76)


Originate the Failure/Malfunction Report (FMR) for each repair action, failure, or malfunction that involves a part, sub-assembly, chassis or
equipment. It is the responsibility of the person who makes the repair, replacement or discovers a malfunction to originate the
FMR. The form has three sections: (1) WHAT FAILED - describes what failed, where, when and who originated the form; (2) WHAT WAS THE
TROUBLE - details what happened and what was done to correct the problem; and (3) FAULT-FAILURE/REPAIR
ANALYSIS - describes the mode and cause of the failure.

The originator fills in the first two sections. Print the data using ball point pen to make the data clear on all carbon copies. Retain the
goldenrod copy of the form and forward the FMR (3 copies or 2 if Customer Rep. copy is pulled on site) as listed:


NO PART INVOLVED
MAILING ADDRESS

SPERRY UNIVAC DSD, FIELD ENGR., MS M2A01
P.O. BOX 3525
ST. PAUL, MINN. 55165


FMR WITH PART
SHIPPING ADDRESS

SPERRY UNIVAC DSD, RETURNED GOODS CRIB
2750 WEST 7TH BLVD.
ST. PAUL, MINN. 55116


DETAILED INSTRUCTIONS FOR ORIGINATING THE FMR FORM


(1) WHAT FAILED SECTION

Place the hardpaper flap below the FMR set being filled out to prevent spoiling the sets below. Enter each digit clearly in the allotted
space, as this data goes directly to Computer Data Bank via Scope Input. Write the letter I, numeric 1, the letter O, numeric zero, the
letter S, the letter U, the letter Z, and the letter J in the distinguishing forms shown on the form [character samples illegible]; and
capitalize all other letters. Do not enter any more letters or digits than a block allows or use any codes not authorized by this
procedure. Enter only the data in each block for which information is available, using the codes contained in the code tables. If a code
is not available, enter the information in Block 36. Enter in these blocks:


Block No.   Block Title          Explanation

6           PROJ.                Project code (See codes - Block 6).
7           FAIL DATE            Date failure was detected.
8           CABINET TYPE         Sperry Univac cabinet type number.
9           CAB S/N              Sperry Univac Manufacturing serial number.
10          E.T.M. (HOURS)       Elapsed Time Meter reading to nearest hour.
11          REPAIR MIN           Time to repair in minutes - isolate, repair and verify - no logistics.
12          TYPE REPORT          Type of report code (See codes if not preprinted.)
13          TYPE OF FAILURE      Check applicable block.
14          CHASSIS TYPE         Chassis type designation or code (See codes - Block 14).
15          CHASSIS S/N          Sperry Univac Manufacturing serial number.
16          SUBA TYPE            Subassembly type code (See codes - Block 16).
17          SUBA S/N             Sperry Univac Manufacturing serial number.
18          SUBA REF.            Reference designation (position) from which failed subassembly was removed.
19          SUBASSEMBLY          Sperry Univac part number of failed subassembly.
            PART NUMBER
20          DASH                 Sperry Univac dash number of failed subassembly.
21          REPL S/N             Serial number of the replacement item.
22          REF DOC NO.          Associated FMR/FR/FCO/EIR, etc.
24          COMP. TYPE           Component or part type code when component or part is removed (See codes - Block 24).
25          COMP. REF.           Component or part reference designation (position) of failed component or part.
26          COMPONENT            Sperry Univac part number of failed component or part when component or part is removed.
27          DASH                 Sperry Univac dash number of failed component or part.
28          VEND. CODE           Fill in vendor name in Block 36 if applicable.
29          DATE CODE            Vendor date code as applicable when components or parts are removed.
32          REPORTED BY          Employee number of person originating FMR.
33          LOCATION OR SITE     Name of location or site where failure occurred.
38          CONTRACT NUMBER      Contract number covering unit on which maintenance or testing is performed.
            EMPLOYEE NAME        Initials and last name of person originating FMR.


NOTES: 1. When Manufacturing serial or type numbers are not available, enter customer nomenclature and serial number in Block 36.

       2. Originator does not make entries in Blocks 3, 4, 5, 28, 30, 31, and 35. Make entries in Blocks 24 through 29 only when repairs
          occur at the component/part level.


(2) WHAT WAS THE TROUBLE SECTION

A. Failure Description - Fill in a brief description of the symptoms of failure, operation routine, test and debugging procedure, errors
noted, or other failure conditions observed. Give sufficient facts about the failure to adequately reconstruct the failure conditions
for each level of assembly.

B. Action Taken - Fill in what was done to isolate this failure/malfunction and to replace or adjust the equipment to remove the
problem. Troubleshooting notes such as switching of subassemblies, running diagnostic routines, testing for open or shorted pins, etc.,
are extremely helpful.

C. Effect of Action - Fill in what tests were run following the replacement of a failed part indicating the equipment is again
operational. Also note part or assembly disposition, e.g., scrap, returned for analysis and/or repair with FMR.

D. Maintenance Problems - Enter problems which were encountered during this maintenance action. Notes as to availability of spares,
replacements, damage, inadequate tools, and troubles in disassembling are helpful for future design considerations.



Appendix A. 4. Field Failure Report Forms 
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EQUIPMENT MALFUNCTION REPORT 



FAILURE ANALYSIS DATA 



MA 06 »TO«AiraATA^ I I I . 
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DETAILED INGTRUCTION'S 


APPENDIX A.5
EMR INSTRUCTIONS


COMPLETE ALL OPEN BLOCKS AS FOLLOWS (FRONT SIDE ONLY)


BLK. NO.   BLK TYPE            EXPLANATION

0          EQUIP NAME          Enter the equipment name, e.g., UYK-20, UYK-7, [illegible], etc.
1          EMR NO.             Equipment Malfunction Report Number.
3          PR                  Part Returned - Y for YES, N for NO.
5          SITE CODE           Enter the unique number for each site, which can be obtained from Sperry Univac, [illegible].
7          FAIL DATE           Enter the date of the failure.
9          EQUIPMENT S/N       Serial Number - Enter the complete customer serial number from the equipment nameplate.
10         ETM HRS             [illegible] readings of the module in which the failure occurred. If there is only one ETM,
                               enter that reading, e.g., UYK-20.
11         REPAIR TIME         Enter the actual time to effect the repair, in minutes, NOT including parts acquisition.
12         REPAIR TYPE         [illegible] the blocks. EM - Emergency Maint., PM - Preventive Maint., FC - Field
                               Change Order, IC - Installation and Checkout.
13         TYPE OF FAILURE     [illegible] SEC - Secondary Failure.
14         MOD/CHASSIS TYPE    [illegible] - Input/Output Controller, IOA - Input/Output Adapter, MEM - Memory,
                               DDM - Double Density Memory, CAB - Cabinet, TS - [illegible], [illegible] - Maintenance Panel,
                               ROCU - Remote Operator Console Unit.
15         MOD/CHASSIS S/N     Chassis Serial Number - Enter the serial number of the module the failure occurred in.
16         SUBA TYPE           Enter subassembly type, i.e., PC.
17         SUBASSY S/N         Enter Serial Number of the failed SUBASSY.
18         SUBASSY REF         Enter the location of the failed SUBASSY, e.g., J.32C.
19         SUBASSY PART NO.    [illegible] number for the failed major assembly, not including dash number.
20         DASH                Enter the dash number of the failed item in Block 19.
21         REPLACEMENT S/N     Enter the replacement Part/Assy Serial No.
22         RELATED EMR NO.     Use only if secondary failure.
32         NAME                Name of the individual making this report. If feedback is desired, include mailing address on
                               the reverse side of the first copy.
33         SITE/LOCATION       Enter the name of the site and geographic location.
36         PROBLEM             Use this space for a narrative description of the failure, to include problem description, how
           COMMENTS            [illegible], corrective action, and any difficulties in isolating the malfunction. If FCO or ECP
                               installation, include the change type and number or any other unusual circumstances.
38         PART TYPE           Enter part type, e.g., RES for Resistor, IC for Integrated Circuit.
39         PART NO.            Enter the part number of the failed part (component) entered in Block 38.
40         DASH NO.            Enter a 3-digit dash number.
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FAILURE DATA 


Intermittent 

0^50 00 0^10000 


Potentic 

0-2000 


250 5 

; 2 2 ' 12 

500 4 

1 2 ; 2 : 11 

750 4 

r 3 : 0 i 6 

1000 6 

3 ' 0 ^ 16 

1250 4 

1 . 1 i 1 ! 6 

1500 6 

1 5 1 3 

1750 7 

12 0 i 5 

2000 5 

2 3^7 

2250 

2 0 

2500 

2 1 

2750 

4 2 

3000 

3 2 1 

3250 

2 0 

3500 

2 , 1 ; 

3750 

2 ' 0 i 

4000 

2 2 i 

4250 

4 . Of 

4500 

2 ; 1 i 

4750 

4 1 11 

5000 

1 i 1 

— 5'25U 

1 2 1 

5500 

• 2 . 

5750 

' *0 ! 

6000 

^ 0 

6250 ; 

i ' ^ i 

6500 I 

[0 

6750 ! 

! ! 1 

7000 1 

! ; 2 ! 

7250 

2 

7500 i 

i i 1 

7750 i 

i i 0 

8000 

' ! 1 

8250 ; 

0 

8500 . 

0 

8750 

0 

9000 

0 

9250 

0 

9500 

0 

9750 

0 

10000 

.1 

0 
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APPENDIX B.2 

FAILURE TOTALS BY CATEGORY 



                  0 - 2000    2000 - 5000    5000 - 10000    Total

Intermittent      41          30             11              82

Potentially
Intermittent      66          51             43              160

Hard              148         142            89              379

Totals            255         223            143             621


APPENDIX B.3 

Confidence Intervals of Parameters of Table 16 


Description                              90% C.I.          95% C.I.

Intermittent Failures
0-2000 hour                              (6287, 11193)     (6014, 11829)

Intermittent Failures
0-5000 hour                              (9184, 14885)     (8767, 15630)

Potentially Intermittent
(0-5000 hr)                              (4873, 6830)      (4778, 7070)

Potentially Intermittent
0-10,000 hours
     0 - 5,000 hours                     (5302, 9277)      (5051, 9846)
     5000 - 8250                         (4478, 8562)      (4227, 9186)
     8250 - 10000                        (3373, 7479)      (3157, 8172)

Potentially Intermittent          n = 169
0 - 2000 hr.

Time      5% Rank     95% Rank

250       .0408       .119
500       .0947       .1807
750       .1234       .2208
1000      .2105       .3229
1250      .2245       .3622
1500      .2578       .3815
1750      .288        .413
2000      .3263       .4567
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nittent Failures 

00 

II 

a 

000 

5* 

Time       5% Rank    95% Rank

250        .01        .09
500        .03        .15
750        .03        .15
1000       .03        .15
1250       .04        .18
1500       .06        .21
1750       .06        .21
2000       .1         .28
2250       .1         .28
2500       .12        .3
2750       .15        .35
3000       .19        .39
3250       .19        .39
3750       .20        .42
4000       .24        .46
4250       .24        .46
4500       .26        .48
4750       .27        .5
5000       .29        .52
5250       .33        .56
5500       .37        .61
5750       .37        .61
6000       .37        .61
6250       .37        .61
6500       .37        .61
6750       .39        .62
7000       .43        .66
7250       .47        .7
7500       .49        .72
8000       .51        .74
8250       .51        .74
8500       .51        .74
8750       .51        .74
9000       .51        .74
9250       .51        .74
9500       .51        .74
9750       .51        .74
10,000     .51        .74
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TIME 

FAIL 

100 

289 

497 

616 

1017 

1375 

1450 

1843 

1938 

1983 


100 
289 
455 
- 497 
616 
1017 
1375 
1843 
1938 
1983 
2268 
2520 
2622 
2799 
2802 
3165 
3428 
3911 
3949 
4037 
4142 
4561 
4603 
4835 
4840 
4992 


1017 

1938 

1983 

2622 

2799 

3911 

3949 

4603 


♦ APPENDIX C.l 
COMPARISON OF DISTRIBUTONS 


OBSV. 

RELIABILITY 

WEIBULL 

RELIAHILI' 

.99999871 

.99999901 

806 

814 

742 

742 

678 

706 

613 

603 

549 

524' 

484 

508 

420 

432 

355 

414 

291 

406 

.99999823 

.99999881 

734 

744 

646 

644 

558 

620 

469 

556 

381 

361 

293 

204 

204 

016 

116 

.99998979 

028 

962 

.99998939 

855 

851 

764 

763 

728 

674 

666 

586 

665 

498 

542 

409 

455 

321 

299 

233 

287 

144 

260 

056 

227 

•99997968 

098 

879 

086 

791 

016 

703 

015 

761 

.99997969 

.99999810 

.99999858 

621 

601 

432 

586 

243 

353 

054 

282 

99998864 

.99998773 

675 

753 

486 

407 


0-2000 I 

EXPONENTIAL C3SV. 
RELIASILITY SURVIVORS 


.99999964 

1552647 

897 

46 

823 

45 

781 

44 

639 

43 

512 

42 

486 

41 

347 

40 

313 

39 

297 

38 

)-5000 I 


.99999952 

1131979 

862 

78 

782 

77 

762 

76 

706 

75 

514 

74 

344 

73 

120 

72 

075 

71 

054 

70 

.99998918 

69 

797 

68 

749 

67 

664 

66 

663 

65 

490 

64 

364 

63 

134 

62 

116 

61 

074 

60 

024 

59 

,99997824 

58 

804 

57 

693 

56 

691 

55 

618 

54 

.10000 I 


99999846 

528576 

706 

75 

699 

74 

603 

73 

576 

72 

408 

71 

402 

70 

303 

69 


WEIBULL 

SURVIVORS 

expcne::tia: 

SURVIVORS 

1552647.4 

155264S.4 

46.1 

47.4 

44.9 

46.3 

44.4 

• 45.6 

42.8 

43.4 

41.6 

41.4 

41.3 

41 

40.1 

38.8 

39.9 

38.3 

39.7 

38 


1131979.6 

11319S0. 4 

78.1 

79.4 

76.9 

78.5 

76.7 

78.3 

75.9 

77.6 

73.7 

75.5 

72 

73.5 

69.8 

71,0 

69.4 

70.5 

69.2 

70.2 

68 

68.7 

67 

67.3 

66.6 

66.8 

65.9 

65.8 

65.8 

65.8 

64.5 

63.9 

63.5 

62.4 

61.7 

59.8 

61.6 

59.6 

61.3 

59.2 

60.9 

58.6 

59.4 

56.3 

59.3 

56.1 

58.5 

54.8 

58.5 

54.8 

58 

54 

528576.2 

76.1 

74.9 

75.4 

74.8 

75.4 

73.5 

74.9 

73.2 

74.7 

70.5 

73.8 

70.4 

73.8 

63.5 

73.3 



0-2000 II 


TIKE 

FAIL 

186 

195 

291 

919 

952 

1698 


186 

195 

291 

919 

952 

1698 

2729 

2730 
3149 


291 

952 

1698 

2729 

2730 
3149 
5320 
5960 


OBSV. 

RELIABILITY 

. 99999935 
806 
742 
677 
613 
549 


.99999911 

734 

646 

558 

469 

381 

293 

204 

116 


. 99999810 
621 
432 
243 
054 

.99998864 

675 

486 


WEIBULL exponential 

RELIABILITY RELIABILITY 

OBSV, 

SURVIVORS 

1552648 

46 

45 

44 

43 

42 

V7EIBULL 

SURVIVORS 

1552647.3 

47.3 

46.7 
43.9 

43.8 
41.2 

. 99999894 
890 
855 
677 
669 
503 

.99999964 

962 

943 

822 

816 

671 

.99999848 

844 

800 

593 

584 

405 

203 

203 

129 

0-5000 II 
.99999967 
965 
485 
837 
831 
700 
517 
517 
443 

1131980 

78 

77 

76 

75 

74 

73 

72 

71 

1131979.2 

79.2 
78.7 

76.3 

76.2 

74.2 
71.9 
71.9 
71.1 

0 

. 99999871 
663 
461 
209 
208 
111 
99998641 
509 

-10000 II 
.99999955 
855 
743 
586 
586 
523 
194 
097 

528576 

75 

74 

73 

72 

71 

70 

69 

528576.3 

75.2 

74.1 
72.8 

72.8 

72.3 

69.8 

69.1 


EXPON'EN’TIi- 

scRvivr^g 


48.4 

48.1 

46.2 
46.1 
43.9 


1131980.6 

80.6 

80.4 

79.1 

79 

77. 

75. 

75, 

74. 


528576.7 

76.2 

75.6 

74.8 

74.8 
74.4 

72.7 

72.2 
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0-2000 III 


TIME 

OBSV. 

fail 

RELIABILITY 

287 

,99999935 

295 

871 ' 

. 299 

806 

306 

742 

333 

677 

341 

613 

352 

549 

412 

484 

423 

.420 

437 

355 

465 

291 

531 

227 

573 

162 

646 

098 

926 

033 

1000 

.99998969 

1014 

905 

1020 

840 

1117 

776 

1162 

711 

1245 

647 

1277 

583 

1620 

518 

1630 

454 

1698 

389 

1853 

325 


WEIBULL 

:^^t.TASILTTY 

.99999770 

762 

758 

749 

725 

717 

706 

644 

632 

617 

587 

515 

468 

385 

048 

.99993955 ' 
937 
929 
804 
746 
636 
593 
122 
108 
012 

.99997789 


EXPONENTIAL 

PPLIA5ILITY 

.99999759 

753 

749 

743 

721 

714 

705 

655 

645 

634 . 
610 
555 
520 
459 
224 
162 
151 
145 
064 
027 

.99998957 

930 

643 

635 
578 
448 


WEIBULL EXPONENTIAL 

.survivors survivors^ 


OBSV. 

g;URV IVORS 

1552648 

47 

46 

45 

44 

43 

42 

41 

40 

39 

38 

37 

36 

35 

34 

33 

32 

31 

30 

29 

28 

27 

26 

25 

24 

23 


1552645.4 

45.3 

45.2 

45.1 
44.7 

44.6 

44.4 

43.4 

43.2 
43 

42.6 

41.4 

40.7 

39.4 

34.2 

32.7 

32.5 

32.3 

30.4 

29.5 

27.8 

27.1 

19.8 

19.6 

18.1 

14.6 


1552645.2 

45.1 

45.1 

45 

44,6 

44.5 

44.4 

43.6 

43.5 

43.3 

42.9 
42 

41.5 

40.6 

36.9 
36 

35.8 

35.7 

34.4 

33.8 

32.8 
32.3 

27.9 

27.8 

26.9 

24.9 





I 


i 


I 
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0-5000 III 


FAIL 

TIME 

295 

299 

306 

341 

352 

313 

423 

437 

465 

573 

646 

926 

1000 

1014 

1020 

1117 

1162 

1277 

1620 

1630 

1698 

1853 

2032 

2065 

2333 

2750 

2799 

2843 

2910 

2971 

3021 

3032 

3055 

3116 

3146 

3426 

3430 

3561 

3658 

4015 

4218 

4342 

4545 

4736 


OBSV. 

RELIABILITY 

WEIBULL 

RELIABILI' 

.99999911 

.99999691 

823 

687 

734 

680 

646 

644 

558 

633 

469 

612 

381 

562 

293 

548 

204 

521 

116 

414 

028 

342 

.99998939 

069 

851 

.99998998 

763 

984 

674 

978 

586 

885 

498 

841 

409 

731 

321 

404 

233 

395 

144 

330 

056 

184 

*99998968 

015 

879 

.99997984 

791 

732 

703 

343 

610 

297 

526 

256 

438 

194 

349 

137 

261 

091 

173 

080 

084 

059 

99996996 

002 

908 

.99996975 

819 

716 

731 

712 

643 

591 

554 

501 

466 

173 

378 

.99995987 

289 

873 

201 

687 

113 

513 


EXPONENTIAL OBSV. 
reliability survivors 


.99999753 

1131980 

749 

79 

743 

78 

714 

77 

705 

76 

687 

75 

645 

74 

634 

73 

610 

72 

520 

71 

459 

70 

224 

69 

162 

68 

151 

67 

145 

66 

064 

65 

027 

64 

.99998930 

63 

643 

62 

635 

61 

578 

60 

448 

59 

298 

58 

271 

57 

0^6 

56 

.99997697 

55 

656 

54 

619 

53 

563 

52 

512 

51 

470 

50 

461 

49 

442 

48 

391 

47 

365 

46 

131 

45 

128 

44 

018 

43 

*99996937 

42 

383 

41 

468 

40 

364 

39 

194 

38 

034 

37 


WEIBULL 

SURVIVORS 

1131977.5 

77.4 

77.3 
76.9 

76.8 
76.6 
76 

75.8 

75.5 

74.3 

73.5 

70.4 

69.6 

69.5 
69.4 

68.3 

67.8 

66.6 

62.9 
62.8 
62.1 

60.4 

58.5 

58.1 

55.3 

50.9 

50.4 

49.9 

49.2 

48.5 
48 

47.9 

47.7 
47 

46.7 

43.8 
43.7 

42.4 

41.4 

37.6 

35.5 

34.2 

32.1 

30.2 


EXPONENTIA 

SURVIVORS 

1131978.2 

78.1 

78.1 

77.7 
77.6 

77.4 
76.9 

76.8 

76.5 

75.5 

74.8 

72.2 

71.5 
71 
71, 

70, 

69, 

68 . 

65, 

65. 

64. 

63. 

61. 

61. 

58. 

54. 

54. 

54 
53. 

52. 

52. 

52. 

52 

51.4 

51.1 

48.5 

48.5 

47.2 

46.3 
42.9 
41 

39.8 

37.5 
36.1 


i 


i 

i 
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0-10000 III 


FAIL   OBSV.        WEIBULL      EXPONENTIAL  OBSV.      WEIBULL    EXPONENTIAL
TIME   RELIABILITY  RELIABILITY  RELIABILITY  SURVIVORS  SURVIVORS  SURVIVORS

295    .99999810    .99999545    .99999804    528576     528574.5   528575.9
299    621          540          802          75         74.5       75.9
341    432          486          774          74         74.2       75.8
423    243          383          719          73         73.7       75.5
437    054          366          710          72         73.6       75.4
646    .99998864    118          572          71         72.3       74.7
952    675          .99998777    369          70         70.5       73.6
1020   486          703          324          69         70.1       73.4
1117   297          600          260          68         69.6       73
1277   108          432          154          67         68.7       72.5
1620   .99997918    083          .99998927    66         66.8       71.3
1630   729          073          920          65         66.8       71.2
1698   540          006          875          64         66.4       71
1853   351          .99997853    773          63         65.6       70.5
2065   162          647          632          62         64.5       69.7
2729   .99996973    022          192          61         61.2       67.4
2730   838          021          192          60         61.2       67.4
2786   594          .99996970    155          59         60.9       67.2
2843   405          917          117          58         60.7       67
3021   216          755          .99997999    57         59.8       66.4
3032   027          745          992          56         59.7       66.3
3149   .99995837    639          914          55         59.2       65.9
3426   648          391          731          54         57.9       65
3430   459          387          728          53         57.9       65
4218   270          .99995698    207          52         54.2       62.2
4342   081          591          124          51         53.6       61.8
4736   .99994891    255          .99996864    50         51.9       60.4
5405   702          .99994695    421          49         48.9       58.1
6269   513          .99993987    .99995849    48         45.2       55
7444   324          047          071          47         40.2       50.9
7709   135          .99992838    .99994895    46         39.1       50.0
8645   .99993946    110          275          45         35.2       46.7
9542   756          .99991424    .99993681    44         31.6       43.6
9610   567          372          636          43         31.4       43.3
9659   378          335          604          42         31.2       43.1
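
The exponential reliability column can be cross-checked against the model R(t) = exp(-λt): if one exponential describes the whole period, every row should imply roughly the same rate λ = -ln R(t) / t. The sketch below is illustrative only and is not taken from the study. It uses the rows above whose reliability is printed in full, assumes fail times are in hours, and assumes the two-parameter Weibull form R(t) = exp(-(t/η)^β) for the shape estimate. The function names `implied_rate` and `implied_shape` are hypothetical.

```python
import math

# (fail time in hours, exponential reliability) for rows printed in full
EXP_ROWS = [
    (295, 0.99999804),
    (1620, 0.99998927),
    (3021, 0.99997999),
    (4736, 0.99996864),
    (6269, 0.99995849),
    (7709, 0.99994895),
    (9542, 0.99993681),
]

# (fail time in hours, Weibull reliability) for two rows printed in full
WEIBULL_ROWS = [(295, 0.99999545), (9542, 0.99991424)]


def implied_rate(t, r):
    """Failure rate implied by R(t) = exp(-rate * t)."""
    return -math.log(r) / t


def implied_shape(row_a, row_b):
    """Weibull shape beta implied by two (t, R) points,
    assuming R(t) = exp(-(t / eta) ** beta)."""
    (t1, r1), (t2, r2) = row_a, row_b
    return math.log(math.log(r2) / math.log(r1)) / math.log(t2 / t1)


rates = [implied_rate(t, r) for t, r in EXP_ROWS]
beta = implied_shape(*WEIBULL_ROWS)

for (t, _), rate in zip(EXP_ROWS, rates):
    print(f"t = {t:5d} h  implied rate = {rate:.3e} per hour")
print(f"implied Weibull shape = {beta:.3f}")
```

For these rows the implied rate stays near 6.6e-9 per hour throughout, which is what a single exponential fit would produce, and the implied Weibull shape comes out at about 0.84.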
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APPENDIX C.2 Proportion Breakdown of Microcircuit Types by Vendor


The 18 microcircuit types previously analyzed for pdfs of time to failure,
which make up a study population of 1552649, are distributed among vendors
as follows:


PROPORTION BREAKDOWN BY VENDOR

         Proportion of   Qty of Intermittent and    Proportion of Intermittent and
VENDOR   Population      Potentially Intermittent   Potentially Intermittent
                         Failures                   Failures

1        .2733           8                          .205
2        .2933           3                          .077
3        .3653           2                          .051
4        .04             0                          0
5        .0166           26                         .667
25       .0075           0                          0
78       .004            0                          0
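
The failure proportions in the table follow directly from the failure quantities, and setting them against each vendor's share of the population shows where failures are over-represented. The sketch below is a minimal illustration, not part of the original analysis. The values are taken in the order the vendor rows are listed, and the `ratio` figure is a hypothetical convenience measure introduced here.

```python
# Quantities of intermittent and potentially intermittent failures,
# in the order the vendor rows are listed in the table above.
failure_qty = [8, 3, 2, 0, 26, 0, 0]
population_share = [0.2733, 0.2933, 0.3653, 0.04, 0.0166, 0.0075, 0.004]

total_failures = sum(failure_qty)
failure_share = [qty / total_failures for qty in failure_qty]

for pop, qty, share in zip(population_share, failure_qty, failure_share):
    # A ratio above 1 means the vendor accounts for more failures than
    # its share of the population alone would suggest.
    ratio = share / pop
    print(f"population {pop:.4f}  failures {qty:2d}  "
          f"share {share:.3f}  ratio {ratio:.2f}")
```

The row with 26 failures holds only .0166 of the population, so its ratio is far above 1, while the three largest population shares all fall below 1.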
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Microcircuit   Total Qty in Population   Proportion by Vendor   Vendor   Failures by Vendor 0-2000 hr.   Failures by Vendor 2000-5000 hr.   Failures by Vendor 5000-10000 hr.

1              272492                    .1755
2              1055481                   .6797
3              59011                     .038
4              69646                     .045



The remaining 14 part types had no intermittent or potentially intermittent
failures and made up 6.19% of the population.
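
The population proportions and the remainder quoted above can be recomputed from the listed quantities and the study population of 1552649. This is a rough arithmetic check under the assumption that the four listed quantities are the only part types with failures. The dictionary keys simply follow the "Microcircuit" column and the variable names are hypothetical.

```python
STUDY_POPULATION = 1552649

# Total quantity in population for the four part types listed above
part_qty = {1: 272492, 2: 1055481, 3: 59011, 4: 69646}

shares = {part: qty / STUDY_POPULATION for part, qty in part_qty.items()}
remaining_share = 1 - sum(shares.values())

for part, share in shares.items():
    print(f"part type {part}: {share:.4f}")
print(f"remaining 14 part types: {remaining_share:.2%}")
```

The recomputed remainder comes out at about 6.2%, in line with the stated figure to within rounding.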

































APPENDIX C.3 

Quantity of Digital Microcircuits per Chassis 


                          IC Per Single   Total IC
                          Chassis         Population

Power Supply
Input Output Controller                   432480
Input Output Adapter                      57737
Central Processing Unit                   588938
Core Memory                               412368
Film Memory                               59754

TOTALS                                    1552649


Grid of Microcircuit Failure by Chassis Function 


                           Power    Input Output   Input Output   Central          Core     Film
                           Supply   Controller     Adapter        Processor Unit   Memory   Memory

Intermittent
Potentially Intermittent
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